
    Database Technology for Processing Temporal Data


    Leveraging range joins for the computation of overlap joins

    Joins are essential and potentially expensive operations in database management systems. When data is associated with time periods, joins commonly include predicates that require pairs of argument tuples to overlap in time in order to qualify for the result. Our goal is to enable built-in system support for such joins. In particular, we present an approach where overlap joins are formulated as unions of range joins, which are more general-purpose than overlap joins, are thus useful in their own right, and are well supported by B+-trees. The approach is sufficiently flexible that it also supports joins with additional equality predicates, as well as open, closed, and half-open time periods over discrete and continuous domains, thus offering both generality and simplicity, which is important in a system setting. We provide both a stand-alone solution that performs on par with the state of the art and a DBMS-embedded solution that exploits standard indexing and clearly outperforms existing DBMS solutions that depend on specialized indexing techniques. We offer both analytical and empirical evaluations of the proposals. The empirical study includes comparisons with pertinent existing proposals and offers detailed insight into the performance characteristics of the proposals.
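    The decomposition in the abstract can be sketched as follows. This is a minimal illustrative version, not the paper's implementation: relations are plain lists of half-open periods (start, end), the function names are invented, and the two range joins run as naive loops where a DBMS would use B+-tree range scans.

```python
def range_join_start_in(outer, inner):
    """Range join: pairs (o, i) where o's start point lies in i's period.
    Illustrative nested loop; a DBMS would evaluate this with a B+-tree."""
    return [(o, i) for o in outer for i in inner
            if i[0] <= o[0] < i[1]]

def overlap_join(r, s):
    """Overlap join expressed as a union of two range joins:
    case 1: r.start falls inside s's period;
    case 2: s.start falls inside r's period, strictly after r.start
            (strict, so the two cases are disjoint and no pair repeats)."""
    part1 = range_join_start_in(r, s)
    part2 = [(b, a) for (a, b) in range_join_start_in(s, r)
             if b[0] < a[0]]     # b is the r-tuple, a the s-tuple
    return part1 + part2

# Hypothetical data: two relations of half-open periods [start, end).
r = [(1, 5), (4, 9)]
s = [(3, 6), (8, 12)]
# Reference result via the direct overlap predicate.
expected = [(a, b) for a in r for b in s if a[0] < b[1] and b[0] < a[1]]
assert sorted(overlap_join(r, s)) == sorted(expected)
```

    The two cases partition the overlap condition by which period starts first, so their union is exactly the overlap join and each qualifying pair is produced once.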

    Disjoint interval partitioning

    In databases with time interval attributes, query processing techniques based on sort-merge or sort-aggregate deteriorate. This happens because no total order exists for intervals, so either the start or the end point must be used for sorting. Doing so leads to inefficient solutions with many unproductive comparisons that do not produce an output tuple. Even if just one tuple with a long interval is present in the data, the number of unproductive comparisons of sort-merge and sort-aggregate becomes quadratic. In this paper we propose disjoint interval partitioning (DIP), a technique to efficiently perform sort-based operators on interval data. DIP divides an input relation into the minimum number of partitions such that all tuples in a partition are non-overlapping. The absence of overlapping tuples guarantees efficient sort-merge computations without backtracking. With DIP the number of unproductive comparisons is linear in the number of partitions. In contrast to current solutions with inefficient random accesses to the active tuples, DIP fetches the tuples in a partition sequentially. We illustrate the generality and efficiency of DIP by describing and evaluating three basic database operators over interval data: join, anti-join, and aggregation.
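    The partitioning step can be sketched with a standard greedy interval-partitioning scheme, which yields the minimum number of pairwise-disjoint partitions. This is only a sketch of the partitioning idea, assuming half-open periods; the function name and data layout are invented, and the paper's operator additionally covers how partitions feed into merge-style join, anti-join, and aggregation.

```python
import heapq

def dip_partition(tuples):
    """Greedy sketch of disjoint interval partitioning: sort by start,
    then place each tuple into an existing partition whose latest end
    point does not exceed the tuple's start (half-open periods).
    A min-heap on the latest end point finds such a partition in
    O(log p); the result uses the minimum number of partitions, and
    each partition contains only pairwise non-overlapping tuples."""
    partitions = []     # list of lists of (start, end)
    heap = []           # (latest_end_in_partition, partition_index)
    for s, e in sorted(tuples):
        if heap and heap[0][0] <= s:       # reuse a free partition
            _, idx = heapq.heappop(heap)
            partitions[idx].append((s, e))
        else:                              # all partitions overlap: open a new one
            idx = len(partitions)
            partitions.append([(s, e)])
        heapq.heappush(heap, (e, idx))
    return partitions

# One long tuple forces a second partition; the short ones share one.
parts = dip_partition([(1, 10), (2, 3), (4, 6), (7, 9), (11, 12)])
assert len(parts) == 2
```

    Because each partition is scanned in start order and is overlap-free, a downstream sort-merge over partition pairs never needs to backtrack.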

    Computing the Fourier Transformation over Temporal Data Streams (Invited Talk)

    In radio astronomy, the sky is continuously scanned to collect frequency information about celestial objects. The inverse 2D Fourier transformation is used to generate images of the sky from the collected frequency information. We propose an algorithm that incrementally refines images by processing frequency information as it arrives in a temporal data stream. A direct implementation of the refinement with the discrete Fourier transformation requires O(N^2) complex multiplications to process an element of the stream. We propose a new algorithm that avoids recomputations and only requires O(N) complex multiplications.
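    The complexity argument can be illustrated in one dimension: when a stream element carries a single new frequency coefficient, folding it into the running image touches each of the N pixels once, i.e. O(N) multiplications, whereas recomputing the full inverse DFT per element costs O(N^2). This is a 1D sketch under that assumption (the paper targets the 2D case); all names are hypothetical.

```python
import cmath

def incremental_idft(stream, N):
    """1D sketch of incremental image refinement: each stream element
    (k, F_k) contributes the frequency coefficient F_k at index k.
    Folding it into the running image costs O(N) complex
    multiplications, instead of recomputing the full inverse DFT."""
    image = [0j] * N
    for k, F in stream:
        for x in range(N):        # O(N) update per stream element
            image[x] += F * cmath.exp(2j * cmath.pi * k * x / N) / N
        yield list(image)         # current refined image

def idft(coeffs):
    """Reference inverse DFT, O(N^2) per full recomputation."""
    N = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * x / N)
                for k in range(N)) / N
            for x in range(N)]

N = 8
coeffs = [complex(k, -k) for k in range(N)]
*_, final = incremental_idft(list(enumerate(coeffs)), N)
# After all coefficients arrive, the refined image matches the full inverse DFT.
assert all(abs(a - b) < 1e-9 for a, b in zip(final, idft(coeffs)))
```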

    Query Results over Ongoing Databases that Remain Valid as Time Passes By

    The ongoing time point now is used to state that a tuple is valid from its start point onward. Ongoing time points have far-reaching implications for database systems since they change continuously as time passes by. State-of-the-art approaches deal with ongoing time points by instantiating them to the reference time. The instantiation yields query results that are only valid at the chosen time and are invalidated as time passes by. We propose a solution that keeps ongoing time points uninstantiated during query processing. We do so by evaluating predicates and functions at all possible reference times. This renders query results independent of a specific reference time and yields results that remain valid as time passes by. As query results, we propose ongoing relations that include a reference time attribute whose value is restricted by predicates and functions on ongoing attributes. We describe and evaluate an efficient implementation of ongoing data types and operations in PostgreSQL.
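    The idea of evaluating a predicate at all reference times can be sketched for a single case: a tuple valid over the ongoing period [start, now), checked for overlap with a fixed query period. Instead of a boolean at one instantiated time, the result is the set of reference times at which the predicate holds. This is a minimal illustration with invented names, assuming half-open periods; the paper defines full ongoing data types and operations.

```python
INF = float("inf")

def overlaps_at_all_times(start, query):
    """Sketch: a tuple valid over the ongoing period [start, now) is
    instantiated at reference time t as [start, t).  Rather than pick
    one t, return the interval of reference times t at which the tuple
    overlaps the fixed query period [qs, qe); that answer stays valid
    as time passes.  Returns None if no reference time qualifies."""
    qs, qe = query
    if start >= qe:
        return None               # overlap impossible at any time
    # [start, t) and [qs, qe) overlap iff start < qe, qs < t, start < t,
    # i.e. for every reference time t strictly greater than max(qs, start).
    return (max(qs, start), INF)

# Check against per-time instantiation: overlap of [5, t) with [3, 10).
res = overlaps_at_all_times(5, (3, 10))
for t in range(20):
    instantiated = 5 < t and 5 < 10 and 3 < t
    assert (res is not None and t > res[0]) == instantiated
```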

    Temporal Query Processing


    Temporal Coalescing


    CORE: Nonparametric Clustering of Large Numeric Databases

    Current clustering techniques are able to identify arbitrarily shaped clusters in the presence of noise, but depend on carefully chosen model parameters. The choice of model parameters is difficult: it depends on the data and the clustering technique at hand, and finding good model parameters often requires time-consuming human interaction. In this paper we propose CORE, a new nonparametric clustering technique that explicitly computes the local maxima of the density and represents them with cores. CORE uses an adaptive grid and gradients to define and compute the cores of clusters. The incrementally constructed adaptive grid and the gradients make the identification of cores robust, scalable, and independent of small density fluctuations. Our experimental studies show that CORE, without any carefully chosen model parameters, produces better-quality clusterings than related techniques and is efficient for large datasets.
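    The core idea, that cluster labels come from local maxima of a grid-based density rather than from tuned parameters, can be illustrated with a loose 1D sketch. This is not the CORE algorithm: it uses a fixed-width grid where CORE builds an incremental adaptive grid, and all names are invented.

```python
from collections import Counter

def grid_cluster(points, cell=1.0):
    """Loose 1D sketch of density-core clustering: bucket points into
    grid cells, treat cells that are local density maxima as cores, and
    label each point by hill-climbing along the density gradient to its
    core.  (CORE itself uses an incrementally built adaptive grid and is
    robust to small density fluctuations; this is only illustrative.)"""
    density = Counter(int(p // cell) for p in points)
    def climb(c):
        while True:
            # Move to a strictly denser neighbor cell; ties keep c,
            # so the walk terminates at a local density maximum (a core).
            best = max((c - 1, c, c + 1),
                       key=lambda n: (density[n], -abs(n - c)))
            if best == c:
                return c
            c = best
    return {p: climb(int(p // cell)) for p in points}

labels = grid_cluster([0.1, 0.2, 0.3, 0.9, 5.0, 5.1, 5.2, 5.8])
assert labels[0.1] == labels[0.9]   # same density core
assert labels[0.1] != labels[5.0]   # separate cluster
```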

    Efficient evaluation of ad-hoc range aggregates

    θ-MDA is a flexible and efficient operator for complex ad-hoc multi-dimensional aggregation queries. It separates the specification of aggregation groups, for which aggregate values are computed (base table b), from the specification of aggregation tuples, from which aggregate values are computed. Aggregation tuples are subsets of the detail table r and are defined by a general θ-condition. θ-MDA requires one scan of r, during which the aggregates are incrementally updated. In this paper, we propose a two-step evaluation strategy for θ-MDA to optimize the computation of ad-hoc range aggregates by reducing them to point aggregates. The first step scans r and computes point aggregates as a partial intermediate result x̃, which can be done efficiently. The second step combines the point aggregates into the final aggregates. This transformation significantly reduces the number of incremental updates to aggregates and reduces the runtime from O(|r|·|b|) to O(|r|), provided that |b| < √|r| and |x̃| ≈ |b|, which is common for OLAP. An empirical evaluation confirms the analytical results and shows the effectiveness of our optimization: range queries are evaluated with almost the same efficiency as point queries.
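    The two-step strategy can be sketched for SUM over one dimension. This is a minimal illustration with invented names: step 1 is the single scan producing point aggregates x̃, step 2 combines them per range; a real implementation would combine via sorted point aggregates or prefix sums rather than the simple filter shown here.

```python
from collections import defaultdict

def range_sums_two_step(detail, ranges):
    """Two-step range-aggregate sketch: step 1 scans the detail table r
    once, accumulating point aggregates per distinct key (the partial
    intermediate result x~); step 2 combines those point aggregates into
    one range aggregate per base tuple.  Incremental updates drop from
    one per (detail tuple, matching range) pair to one per detail tuple."""
    point = defaultdict(float)            # step 1: point aggregates x~
    for key, val in detail:
        point[key] += val                 # single update per detail tuple
    return [sum(v for k, v in point.items() if lo <= k <= hi)  # step 2
            for lo, hi in ranges]

# Hypothetical detail table r = (key, measure) and base ranges b.
detail = [(1, 10.0), (2, 5.0), (2, 5.0), (4, 1.0), (7, 2.0)]
ranges = [(1, 2), (2, 7), (5, 6)]
assert range_sums_two_step(detail, ranges) == [20.0, 13.0, 0.0]
```

    Since |x̃| (distinct keys) is typically far smaller than |r| in OLAP workloads, the combine step is cheap relative to the scan, which is what makes range queries nearly as fast as point queries.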